pandas: Used for data manipulation and analysis.
numpy: Provides support for numerical operations.
StandardScaler: Standardizes features by removing the mean and scaling to unit variance.
KMeans: Implements the KMeans clustering algorithm from scikit-learn.
matplotlib.pyplot: For plotting graphs and charts.
seaborn: A statistical plotting library built on top of matplotlib. Makes plots more attractive and informative.
display: From IPython; used to render and visually format data (e.g., DataFrames) in Jupyter notebooks.
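As a rough illustration of the StandardScaler description above, the sketch below standardizes a small made-up array by hand with numpy. It assumes the population standard deviation (ddof=0), which is what StandardScaler uses. The `standardize` helper and the `X_demo` values are hypothetical and exist only for this example; they are not part of the original code.

```python
import numpy as np

def standardize(X):
    """Column-wise (x - mean) / std, using the population std (ddof=0)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)      # numpy's default is ddof=0
    std[std == 0] = 1.0      # avoid division by zero; constant columns become all zeros
    return (X - mean) / std

# Made-up values: two features on very different scales.
X_demo = np.array([[1.0, 100.0],
                   [2.0, 300.0],
                   [3.0, 500.0]])
X_demo_scaled = standardize(X_demo)
```

After scaling, both columns have mean 0 and standard deviation 1, so neither feature dominates the distance computations that KMeans relies on.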
-->features = ['RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe']
Selects the features to be used for clustering from the dataset.

-->X = df[features]
Extracts the selected features from the DataFrame into a new variable X.

-->scaler = StandardScaler()
Initializes a scaler to standardize the feature values.

-->X_scaled = scaler.fit_transform(X)
Fits the scaler on the data and transforms X so that each feature has mean 0 and standard deviation 1.

-->kmeans = KMeans(n_clusters=6, random_state=42, n_init=10)
Initializes KMeans with 6 clusters and a fixed random seed (random_state=42) for reproducibility. n_init=10 runs the algorithm from 10 different centroid initializations and keeps the run with the lowest inertia.

-->df['Cluster'] = kmeans.fit_predict(X_scaled)
Performs KMeans clustering and assigns the resulting cluster labels to a new column in the DataFrame.

-->count_matrix = pd.crosstab(df['GlassType'], df['Cluster'], margins=True, margins_name='Total')
Creates a contingency table that counts the samples for each combination of true glass type and assigned cluster, adding row/column totals.

-->total_counts = count_matrix.loc['Total']
Extracts the total number of samples per cluster from the 'Total' row of the matrix.

-->percentage_matrix = count_matrix.div(total_counts).mul(100).round(1)
Divides each column by its total and multiplies by 100, so each cell gives the percentage of that cluster's samples belonging to a glass type, rounded to one decimal place.

-->combined_matrix = count_matrix.copy().astype(str)
Makes a copy of the count matrix and converts values to strings for formatting.

-->for col in count_matrix.columns:
       for idx in count_matrix.index:
           count = count_matrix.loc[idx, col]
           percent = percentage_matrix.loc[idx, col]
           combined_matrix.loc[idx, col] = f"{count} ({percent}%)"
Loops through each cell to combine the count and percentage into a single string like "12 (30.0%)".

-->print("\nCLASS-CLUSTER DISTRIBUTION MATRIX")
Prints a heading to describe the following table.

-->print("(Format: Count (Percentage%))")
Prints a legend to explain the format of the combined matrix.
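The count-and-percentage steps above can be tried on a small made-up table. The sketch below is only an illustration: `toy_df` and its values are hypothetical stand-ins for the real `df`, and the string concatenation at the end is an alternative to the nested loop, not the original code's method.

```python
import pandas as pd

# Hypothetical toy data standing in for df['GlassType'] and df['Cluster'].
toy_df = pd.DataFrame({
    "GlassType": [1, 1, 1, 2, 2, 3],
    "Cluster":   [0, 0, 1, 1, 1, 0],
})

# Counts per (glass type, cluster), with a 'Total' row and column.
count_matrix = pd.crosstab(
    toy_df["GlassType"], toy_df["Cluster"],
    margins=True, margins_name="Total",
)

# Column totals live in the 'Total' row; dividing by them gives
# percentages relative to each cluster column.
total_counts = count_matrix.loc["Total"]
percentage_matrix = count_matrix.div(total_counts).mul(100).round(1)

# Build the "count (percent%)" strings for all cells at once,
# instead of looping over every cell.
combined_matrix = (
    count_matrix.astype(str) + " (" + percentage_matrix.astype(str) + "%)"
)

print(combined_matrix)
```

Because the division is by column totals, every entry in the 'Total' row comes out as 100%, and each cluster column sums to roughly 100% across the glass types (up to rounding).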
-->styled_combined = combined_matrix.style.set_caption("Combined Class-Cluster Distribution (Count with Percentages)").set_table_styles([
       {'selector': 'caption', 'props': [('font-size', '16px'), ('font-weight', 'bold'), ('text-align', 'center')]},
       {'selector': 'th', 'props': [('text-align', 'center')]},
       {'selector': '.row_heading, .col_heading', 'props': [('font-weight', 'bold')]}
   ])
Formats the combined matrix for display with a bold, centered caption, centered header cells, and bold row and column headings.

-->display(styled_combined)
Displays the styled combined matrix in a rich format (e.g., in Jupyter notebooks).

-->styled_counts = count_matrix.style.background_gradient(
       cmap='Blues',
       subset=pd.IndexSlice[count_matrix.index.drop("Total"), count_matrix.columns.drop("Total")]
   ).set_caption("Raw Count Matrix with Gradient Coloring")
Creates a styled version of the raw count matrix with a blue gradient, applied to every cell except the Total row and column, so that larger counts stand out.

-->display(styled_counts)
Displays the styled count matrix with gradient coloring.

CODE FOR ALL THE 10 DIFFERENT SEEDS WITH PERCENTAGE